---
title: Exploring Object Detection using IceVision w/ FastAI
keywords: fastai
sidebar: home_sidebar
nb_path: "20_subcoco_ivf.ipynb"
---
The full COCO dataset is huge (~50GB?). For self-education in object detection, with the intention of using a pretrained model for transfer learning, it is not practical to deal with a dataset this big as a first project. Luckily, the kind folks at FastAI have prepared some convenient subsets: the medium-size 3GB https://s3.amazonaws.com/fast-ai-coco/coco_sample.tgz seems like a good candidate. The 800KB http://files.fast.ai/data/examples/coco_tiny.tgz, on the other hand, seems way too small and may not have enough data for adequate training.
If playing with the tiny COCO subset instead, use these values:
froot = "coco_tiny"
fname = f"{froot}.tgz"
url = f"http://files.fast.ai/data/examples/{fname}"
json_fname = datadir/froot/'train.json'
img_dir = datadir/froot/'train'
train_json['categories'], train_json['images'][0], [a for a in train_json['annotations'] if a['image_id']==train_json['images'][0]['id'] ]
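The list comprehension above rescans the full annotation list for every image. For anything beyond a quick peek, it is cheaper to index annotations by image id once. A minimal sketch, using a tiny inline COCO-style dict in place of the real `train.json`:

```python
from collections import defaultdict

# Tiny inline stand-in for the COCO-style dict loaded from train.json.
train_json = {
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "images": [{"id": 10, "file_name": "000010.jpg"}],
    "annotations": [
        {"image_id": 10, "category_id": 1, "bbox": [5, 5, 50, 40]},
        {"image_id": 10, "category_id": 2, "bbox": [60, 20, 30, 30]},
    ],
}

# Index annotations once so each per-image lookup is O(1).
anns_by_img = defaultdict(list)
for a in train_json["annotations"]:
    anns_by_img[a["image_id"]].append(a)

first_id = train_json["images"][0]["id"]
print(len(anns_by_img[first_id]))  # number of boxes on the first image
```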
print(
    f"Categories {stats.num_cats}, Images {stats.num_imgs}, Boxes {stats.num_bboxs}, avg (w,h) {(stats.avg_width, stats.avg_height)}, "
    f"avg cats/img {stats.avg_ncats_per_img:.1f}, avg boxs/img {stats.avg_nboxs_per_img:.1f}, avg boxs/cat {stats.avg_nboxs_per_cat:.1f}.")
print(f"Image means by channel {stats.chn_means}, std.dev by channel {stats.chn_stds}")
stats.lbl2name, stats.lbl2cat, stats.cat2lbl
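These mappings translate between COCO category ids and contiguous training labels. A hedged reconstruction of how they could be built from the COCO `categories` list (the `stats` helper's actual code isn't shown here, so this is my reading of what those names hold); labels start at 1, leaving 0 free for background:

```python
# Hypothetical reconstruction of the stats.* mappings from a COCO categories list.
categories = [{"id": 18, "name": "dog"}, {"id": 17, "name": "cat"}]

# Sort by original COCO id, then assign contiguous labels starting at 1
# (label 0 is conventionally reserved for background).
cats_sorted = sorted(categories, key=lambda c: c["id"])
lbl2cat = {lbl: c["id"] for lbl, c in enumerate(cats_sorted, start=1)}
cat2lbl = {cat: lbl for lbl, cat in lbl2cat.items()}
lbl2name = {lbl: c["name"] for lbl, c in enumerate(cats_sorted, start=1)}

print(lbl2name)  # {1: 'cat', 2: 'dog'}
```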
To prevent bounding boxes from ending up too close to the margin or too small, especially after augmentation transforms, I would set min_margin_ratio = 0.05 and min_width_height_ratio = 0.05.
However, IceVision 2.0 now has autofix, which should address these issues; it does take a long time to run, though...
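A hand-rolled version of the margin/size filter described above could look like the sketch below. This is only an illustration of the two ratio thresholds, not IceVision's actual autofix logic (which clips and drops invalid boxes during parsing):

```python
def keep_bbox(bbox, img_w, img_h,
              min_margin_ratio=0.05, min_width_height_ratio=0.05):
    """Drop boxes hugging the image border or too small relative to the image.

    bbox is COCO-style [x, y, w, h]. Illustrative filter only, not
    IceVision's autofix implementation.
    """
    x, y, w, h = bbox
    margin_x, margin_y = img_w * min_margin_ratio, img_h * min_margin_ratio
    # Too close to any edge?
    if x < margin_x or y < margin_y:
        return False
    if x + w > img_w - margin_x or y + h > img_h - margin_y:
        return False
    # Too small relative to the image?
    if w < img_w * min_width_height_ratio or h < img_h * min_width_height_ratio:
        return False
    return True

print(keep_bbox([100, 100, 50, 50], 512, 512))  # well inside: True
print(keep_bbox([0, 0, 30, 30], 512, 512))      # hugs the corner: False
```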
class_map = ClassMap(list(stats.lbl2name.values()))
show_records(train_records[:4], ncols=2, class_map=class_map, show=True)
inf_tfms, learn, backbone_name = gen_transforms_and_learner(img_size=512, bs=4, acc_cycs=8)
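The `bs=4, acc_cycs=8` arguments suggest gradient accumulation (my reading of this notebook helper, whose body isn't shown): gradients from 8 mini-batches of 4 are accumulated before each optimizer step, giving an effective batch of 32 while only a batch of 4 needs to fit in GPU memory at img_size=512. A toy sketch of the arithmetic:

```python
def effective_batch_size(bs, acc_cycs):
    """With gradient accumulation, gradients from acc_cycs mini-batches are
    combined before each optimizer step, so one step sees bs * acc_cycs samples."""
    return bs * acc_cycs

def accumulate(grads, acc_cycs):
    """Toy accumulation loop: average every acc_cycs per-batch gradients
    into one 'optimizer step' gradient."""
    steps, buf = [], 0.0
    for i, g in enumerate(grads, start=1):
        buf += g
        if i % acc_cycs == 0:
            steps.append(buf / acc_cycs)
            buf = 0.0
    return steps

print(effective_batch_size(4, 8))   # 32
print(accumulate([1, 2, 3, 4], 2))  # [1.5, 3.5]
```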
I have experimented with other models available out of the box in IceVision, but efficientdet works best. You can replace backbone_name, backbone, and model with the following values to test.
backbone_name
backbone
model
learn.lr_find()
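`lr_find` runs fastai's learning-rate range test: it trains over a batch sweep while the learning rate climbs a log scale, and you pick a rate where the loss is still falling steeply. A sketch of just the log-spaced sweep (the schedule idea only, not fastai's implementation):

```python
def lr_schedule(start=1e-7, end=10, num=100):
    """Learning rates spaced evenly on a log scale between start and end,
    as swept by an LR range test (illustrative values)."""
    ratio = (end / start) ** (1 / (num - 1))
    return [start * ratio**i for i in range(num)]

lrs = lr_schedule()
print(f"{lrs[0]:.0e} ... {lrs[-1]:.0e}")  # 1e-07 ... 1e+01
```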
run_training(learn, head_runs=1, full_runs=0)
infer_ds = Dataset(valid_records[:4], inf_tfms)
infer_dl = efficientdet.infer_dl(infer_ds, batch_size=4, shuffle=True)
samples, preds = efficientdet.predict_dl(learn.model, infer_dl)
imgs = [sample["img"] for sample in samples]
show_preds(
imgs=imgs[:4],
preds=preds[:4],
class_map=class_map,
denormalize_fn=denormalize_imagenet,
ncols=1,
figsize=(36,27)
)
As you can see, after only 2 epochs of training the model is not yet usable.
final_saved_model_fpath = f"models/{backbone_name}-subcoco-final.pth"
save_final(final_saved_model_fpath)
pretrained_model = efficientdet.model(model_name=backbone_name, num_classes=len(stats.lbl2name), img_size=512)
pretrained_model.load_state_dict(torch.load(final_saved_model_fpath))
Run inference with four of the validation images...
infer_ds = Dataset(valid_records[128:132], inf_tfms)
infer_dl = efficientdet.infer_dl(infer_ds, batch_size=4, shuffle=False)
samples, preds = efficientdet.predict_dl(pretrained_model.cuda(), infer_dl)
imgs = [sample["img"] for sample in samples]
show_preds(
imgs=imgs[:4],
preds=preds[:4],
class_map=class_map,
denormalize_fn=denormalize_imagenet,
ncols=1,
figsize=(36,27)
)